ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These make it possible to build complex associations between molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web for visualizing protein-protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.

INTRODUCTION 

The Gene3D database (1) provides protein domain annotations for sequences from the major sequence databases Ensembl, UniProt and RefSeq (2-4). Proteins are generally composed of one or more discrete, independently folding units known as domains, and the CATH database (5) uses a combination of manual curation and automated evidence gathering to generate a superfamily classification of the domains in structures from the PDB (6). An accurate HMM- and graph theory-based method, DomainFinder (Yeats, Redfern and Orengo, manuscript in revision), is used to identify and resolve the boundaries of predicted domains. The new release of Gene3D (v10.2) provides over 16 million predicted domains from 2549 CATH superfamilies in 60% of approximately 15 million scanned sequences. This is an increase of 5% in domain annotation coverage compared with our last review in NAR (1). Gene3D domain annotations are provided via the Gene3D website (http://gene3d.biochem.ucl.ac.uk), the CATH-Gene3D DAS (http://gene3d.biochem.ucl.ac.uk/Gene3D/Das), RESTful web services at http://gene3d.biochem.ucl.ac.uk/WebServices/ (7) and InterPro (8).

Protein domains, and distinct combinations of them, are considered the primary building blocks of protein function evolution. Assigning domains to a protein can help identify functionally important residues from distant homologues (9), provide mechanistic explanations for the effects of sequence polymorphisms (10) and enable the 'inheritance' of interactions from homologues (11,12). To enhance the domain annotations generated by Gene3D, we also integrate many complementary data sources. These include molecular and pathway function annotations from GO (13), taxonomic information from the NCBI (14) and drug targets from DrugBank (15).

At least 10% of the superfamilies in CATH are functionally highly diverse. Because these superfamilies are also highly populated, accounting for more than half of the domain annotations in Gene3D, we have subdivided them into functionally coherent families, FunFams, derived using an in-house protocol (16). Given the well-established correlation between structural and functional similarity, we also include information on structurally similar clusters (SCs) of domains (17). These two new types of annotation are designed to support homology-based function transfer and the study of function evolution, especially in the larger, functionally diverse superfamilies.



SUMMARY OF PRINCIPAL CHANGES 

The major change in the latest release of Gene3D is the completely revamped website, which allows visualization of genome-wide family distributions and lets biologists easily review the known functional information for multiple proteins and examine their interaction partners in the context of domain families. It also allows genome researchers to identify over-represented protein families and their functions. Below, we describe the database updates and give some common use case scenarios.

Database updates 

We have extended our previous sequence set (UniProtKB, RefSeq and Ensembl), and we now provide domain assignments for all of Ensembl-Protists, Ensembl-Plants, Ensembl-Fungi, Ensembl-Metazoa, Ensembl-Bacteria and UniProt splice variants, a total of 14 963 305 protein sequences. Approximately 16 million domains were found in these sequences, covering 60% of them.

FunFams are constructed using an in-house protocol (16), which involves profile-profile-based clustering of domain sequences in each superfamily to identify functional families. We have recently improved the speed and assignment quality of the method so that FunFams can now be identified in all Gene3D superfamilies. After family identification, an HMM is built for each FunFam using HMMER (18), together with a model-specific bitscore threshold based on the score attained by the most remote member sequence.
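The thresholding rule can be illustrated with a minimal sketch. It assumes that each FunFam member has already been scored against its own family HMM (for example by parsing HMMER output, which is not shown) and that "most remote member" means the lowest-scoring member; the function names and scores below are hypothetical.

```python
from typing import Dict, List


def family_threshold(member_scores: Dict[str, float]) -> float:
    """Model-specific cutoff: the bitscore of the most remote (lowest-scoring) member."""
    if not member_scores:
        raise ValueError("a FunFam needs at least one member sequence")
    return min(member_scores.values())


def assign_to_family(candidate_scores: Dict[str, float], threshold: float) -> List[str]:
    """IDs of candidates scoring at or above the family cutoff, best-scoring first."""
    hits = [seq_id for seq_id, score in candidate_scores.items() if score >= threshold]
    return sorted(hits, key=lambda seq_id: candidate_scores[seq_id], reverse=True)


# Hypothetical bitscores of the training members against their own FunFam HMM.
members = {"member_1": 212.4, "member_2": 150.0, "member_3": 98.7}
cutoff = family_threshold(members)  # 98.7
candidates = {"query_1": 120.0, "query_2": 80.0, "query_3": 98.7}
print(assign_to_family(candidates, cutoff))
```

Using the weakest member as the cutoff means every training sequence is recovered by its own model, at the cost of admitting any new sequence that scores at least as well as that most remote member.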

SCs are also useful for suggesting functional annotations for uncharacterized sequences, because they group together relatives that have significant structural similarity and are therefore likely to have related functions. SCs are generated by clustering highly similar CATH domain structures. Similarities are calculated using CATHEDRAL (19) and clustered with a threshold of 5 Å normalized root mean square deviation (RMSD). The clustering used in this work is based on an in-group clustering algorithm. Like complete-linkage clustering, it is an agglomerative, hierarchical clustering method in which clusters are joined if and only if their least similar members meet the cutoff (hence avoiding the chaining associated with single-linkage clustering). However, it differs slightly from complete-linkage clustering in that the order in which clusters are joined is based on the most similar pairs rather than the least similar pairs. For each SC, an HMM is again built using HMMER (18) and used to scan the sequences in the respective Gene3D superfamily. Both the SC and FunFam methods are still under active development by the CATH team and are expected to become better integrated in future releases.
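The in-group clustering rule can be sketched as follows. This is an illustration of the two properties stated above (a merge is allowed only if every cross-cluster pair meets the cutoff, and allowed merges are taken in order of their most similar pair), not the CATH team's implementation. The RMSD values are hypothetical inputs standing in for CATHEDRAL scores, and the naive search is only suitable for small examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def in_group_cluster(domains: List[str],
                     rmsd: Dict[FrozenSet[str], float],
                     cutoff: float = 5.0) -> List[Set[str]]:
    """Agglomerative clustering: complete-linkage admissibility test,
    merge order chosen by the closest cross-cluster pair."""
    clusters: List[Set[str]] = [{d} for d in domains]
    while True:
        best = None  # (closest cross-pair distance, index_i, index_j)
        for i, j in combinations(range(len(clusters)), 2):
            cross = [rmsd[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            # Join only if even the least similar pair meets the cutoff (no chaining).
            if max(cross) > cutoff:
                continue
            # Among admissible joins, prefer the one with the most similar pair.
            closest = min(cross)
            if best is None or closest < best[0]:
                best = (closest, i, j)
        if best is None:
            return clusters
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i] |= merged


# Hypothetical normalized RMSD values (in Å) between four domains.
pairs = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 6.0,
         ("C", "D"): 2.0, ("A", "D"): 7.0, ("B", "D"): 8.0}
rmsd = {frozenset(p): d for p, d in pairs.items()}
print(in_group_cluster(["A", "B", "C", "D"], rmsd))
```

With these toy distances, single-linkage clustering at 5 Å would chain all four domains together through A-B, A-C and C-D, whereas the rule above stops at {A, B} and {C, D} because B-C, A-D and B-D exceed the cutoff.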

Other expansions to functional data sets include the addition of protein interactions from the BioGRID (20), Reactome (21) and DIP (22) databases, supplementing the previous set, which comprised IntAct (23), MINT (24) and HPRD (25). Combining interaction data sets remains an important step in maximizing the coverage of protein interaction networks. We have also added the ability to show whether a protein has a knockdown experiment in the popular and comprehensive GenomeRNAi (26) database, with links to this database for more detailed information.
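A minimal sketch of combining interaction sets is shown below. It assumes that all sources have already been mapped to a shared accession namespace (for example UniProt) and treats interactions as undirected pairs. Real merging also has to reconcile interaction types and evidence, which is not attempted here, and the accessions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def merge_interactions(
    sources: Dict[str, Iterable[Tuple[str, str]]]
) -> Dict[FrozenSet[str], Set[str]]:
    """Union of undirected interaction pairs, recording which sources report each one."""
    merged: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for source_name, pairs in sources.items():
        for a, b in pairs:
            # frozenset makes (a, b) and (b, a) the same key; a self-interaction becomes {a}
            merged[frozenset((a, b))].add(source_name)
    return dict(merged)


# Hypothetical accessions, already mapped to one namespace.
network = merge_interactions({
    "IntAct": [("P1", "P2"), ("P2", "P3")],
    "BioGRID": [("P2", "P1"), ("P3", "P4")],
})
for pair, found_in in sorted(network.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), sorted(found_in))
```

Keeping the set of supporting sources per edge preserves coverage from the union while still letting a user see which interactions are reported by more than one database.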

A more interactive, graphical website 

The website has been re-implemented, with many new features added and speed-ups provided. A new front page offers the main search types that can be carried out. Below, we describe the different types of pages available on the website and include example use case scenarios for each one.

Protein views 

The sequence search box accepts many types of identifiers. For several model and medically important organisms, gene names are auto-completed. There is also a taxonomic filter box with auto-complete for all Ensembl organisms in Gene3D. Searching from this box retrieves a set of proteins showing their multi-domain architectures (MDAs). The resulting page (Figure 1) shows some default information, and many additional annotations can be interactively displayed or hidden (Figure 1A). Buttons are available for domain visualization options or to retrieve interaction partners of the displayed proteins (Figure 1B). Domain images can be tailored thanks to our update to the newer JavaScript domain graphics library provided by Pfam (27). A common requirement for biologists is to filter by a given sequence motif (28). Entering amino acid motifs in the sequence filter search box, as plain text or regular expressions, filters the proteins displayed in the table. An extensive list of such expressions can be found, for example, at the ELM resource (28). There is also a global search filter, which filters text across all columns.
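Motif filtering of this kind can be sketched with Python's standard `re` module. This is an illustration, not the website's code: it assumes the motif is written as a Python regular expression (ELM patterns use a closely related syntax but may need adapting), and the sequences are hypothetical. A plain-text motif such as "RGD" works unchanged because it contains no regex metacharacters.

```python
import re
from typing import Dict, List


def filter_by_motif(sequences: Dict[str, str], motif: str) -> Dict[str, List[int]]:
    """Keep proteins whose sequence matches `motif` (a Python regular expression).

    Returns the 1-based start positions of non-overlapping matches per protein."""
    pattern = re.compile(motif)
    hits: Dict[str, List[int]] = {}
    for protein_id, seq in sequences.items():
        starts = [m.start() + 1 for m in pattern.finditer(seq)]
        if starts:
            hits[protein_id] = starts
    return hits


# Hypothetical sequences; "P..P" is a PxxP-style pattern, "RGD" is plain text.
proteins = {"seqA": "MKPAAPLL", "seqB": "MKLLRGDE"}
print(filter_by_motif(proteins, r"P..P"))
print(filter_by_motif(proteins, "RGD"))
```

`finditer` reports non-overlapping matches only; overlapping motif instances would need a lookahead pattern.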

Another commonly desired search query is to retrieve all of the proteins from a genome that contain a particular domain superfamily or MDA of interest. This can be achieved using the superfamily search box on the front page of the website. Here, the user enters a CATH superfamily code or name, and an organism taxon ID or scientific name to restrict the sequence results to the genome of interest. If multiple CATH codes are entered, the search returns only those genes containing all of the domain superfamilies searched for, thereby providing a domain composition search. As an example of the use of search term screening, consider a biologist who wants to further elucidate the function of their proteins from their domain content. Searching for the genes known to be involved in late anaphase chromosome condensation (AURKB and KIF22), we see the domain architectures of the proteins by default (Figure 1C). More functional information is available from a dropdown menu (Figure 1A). Inspecting this GO annotation shows KIF22 to have both DNA and microtubule binding functions, which is consistent with the functions of its two domains and its possible mechanism in chromosome condensation (29).

Figure 1. Results of a search for genes KIF22 and AURKB. More functional annotation is available from a dropdown menu (A). Interaction partners of these proteins can be displayed (B). Clicking on a domain image provides more details of the domain and links to structure, sequence and taxonomic information (C).
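The multi-code superfamily search described in this section amounts to a subset test: a gene is returned only if every queried CATH code occurs among its domain assignments. A minimal sketch, assuming a hypothetical in-memory mapping from gene IDs to assigned CATH codes (the genes and their assignments are invented for illustration):

```python
from typing import Dict, Iterable, List, Set


def composition_search(gene_domains: Dict[str, List[str]],
                       query_codes: Iterable[str]) -> List[str]:
    """Genes whose domain assignments include *all* queried CATH superfamily codes."""
    required: Set[str] = set(query_codes)
    # `<=` is the subset test: every required code must be among the gene's codes.
    return sorted(gene for gene, codes in gene_domains.items() if required <= set(codes))


# Hypothetical genes and assignments.
genome = {
    "geneA": ["3.30.200.20", "1.10.510.10"],
    "geneB": ["1.10.510.10"],
    "geneC": ["3.40.50.300", "3.30.200.20", "1.10.510.10"],
}
print(composition_search(genome, ["1.10.510.10", "3.30.200.20"]))
```

A single code reduces to an ordinary superfamily search, and adding codes can only narrow the result.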

Protein interaction data 

As the wealth of protein interaction data increases, researchers commonly analyse their proteins of interest as part of a system of dynamic interactions. A large portion of protein-protein interactions (PPIs) is also known to be mediated by common domain pairings [see (12,13) for resources]. Gene3D integrates the major experimentally defined PPI databases to provide a comprehensive network analysis tool. The network can be obtained from a front-page search box that mirrors the protein search box, or from any set of protein results (Figure 1B). The aim of this utility is not to provide protein interaction predictions, as this area is well served by popular resources such as STRING (30) and PIPs (31); instead, it focuses on complementing experimentally defined interactions with domain information and vice versa.

The network is displayed using the Cytoscape Web application (32) (Figure 2). The proteins in the network can be analysed in terms of their domain content (Figure 2B) or other protein features. Because networks can quickly become too large to view comfortably, the PPI page offers many filtering options. A summary provides statistics such as the superfamily frequencies in the network. Various interactive options are available, such as highlighting the proteins in the network that contain a given domain, displaying in the table only the proteins selected in the network, or selecting in the network the proteins visible in the proteins table. Combining these options with the other filtering options provides a quick and powerful way to explore the data.
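The superfamily summary and the "highlight proteins with a given domain" option can be sketched as below. The sketch assumes that a superfamily's frequency means the number of network proteins containing it, with each superfamily counted once per protein even if it occurs as several domains. The website's exact statistic is not specified here, and the proteins and superfamily labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def network_proteins(edges: List[Edge]) -> Set[str]:
    return {p for edge in edges for p in edge}


def superfamily_frequencies(edges: List[Edge],
                            protein_domains: Dict[str, List[str]]) -> Counter:
    """Number of network proteins containing each superfamily (once per protein)."""
    counts: Counter = Counter()
    for protein in network_proteins(edges):
        counts.update(set(protein_domains.get(protein, [])))
    return counts


def proteins_with(superfamily: str, edges: List[Edge],
                  protein_domains: Dict[str, List[str]]) -> Set[str]:
    """Proteins to highlight: those in the network that contain `superfamily`."""
    return {p for p in network_proteins(edges)
            if superfamily in protein_domains.get(p, [])}


# Hypothetical network; P1 has two copies of superfamily "sf_a".
edges = [("P1", "P2"), ("P2", "P3")]
domains = {"P1": ["sf_a", "sf_a", "sf_b"], "P2": ["sf_b"], "P3": []}
print(superfamily_frequencies(edges, domains).most_common())
print(sorted(proteins_with("sf_b", edges, domains)))
```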

As an example, we look at KSHV (Kaposi's sarcoma-associated herpesvirus), a medically important human pathogen. Searching with several known oncogenic KSHV proteins (LANA, vIL-6, vIRF, ORF71, K1 and K12) (33) retrieves a set of human-KSHV interactions (Figure 2). Several of these proteins have predicted structural domains (Figure 2B), and we can inspect the summary of superfamilies found in the network, some of which are involved in apoptotic pathways. KSHV is known to immortalize cell lines (34) and to subvert the host cell's molecular machinery; such networks and domain annotations can provide helpful insights into this process.

CATH superfamily pages 

A list of all 2549 CATH superfamilies in Gene3D, with associated information and links, can be found at http://gene3d.biochem.ucl.ac.uk/superfamily/. The page displays various categories of information, including abundance, structural diversity, functional diversity and taxonomic distribution. More detailed information on an individual CATH superfamily can be obtained from the front-page superfamily search, or by simply adding the CATH code to the end of the above URL. The breakdown of the superfamily into functional families is shown automatically. Another tab shows structurally similar sets of domains (SCs) that are likely to have related functions; from here, it is possible to link through to CATH to see the structural representative for each SC. A third tab shows the taxonomic distribution of the CATH superfamily across Ensembl genomes. The data are presented in searchable tables, which can also be saved as text files.

Figure 2. Result of searching for oncogenic viral proteins from KSHV for protein interactions in human (using UniProt accessions Q9DUM3, Q5G851, Q5G850, Q76RF1, Q98823, Q77Q82, D3JNC0). The panel on the left shows the protein interactions. Clicking on a node or an edge generates pop-ups with more information on the protein or the interaction, respectively (A). The tabs on the right (B) contain information on the proteins, superfamilies and interaction types in the network.

Genome comparison 

With the advent of genome sequencing, several powerful tools have been developed for genome comparison, including comparison of domain content (35). Gene3D now provides a comparison tool for visualizing such differences. From the front search page, the user simply inputs the two Ensembl genomes to be compared and obtains, as a network representation, a summary of the CATH superfamilies and CATH superfamily combinations whose frequencies in genes differ most between the two genomes (Figure 3). In this network, nodes correspond to superfamilies, and edges (links between nodes) indicate superfamilies that co-occur in the same gene. The sizes, colours and thicknesses of the nodes (superfamilies) and edges (superfamily combinations) indicate how different the counts are between the organisms. Context-specific help on the webpage provides a complete description for interpreting the results.
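The counting behind such a comparison network can be sketched as follows. Nodes are scored by the number of genes containing each superfamily, and edges by the number of genes in which two superfamilies co-occur. The ranking uses a plain difference of raw gene counts, which is an assumption: the measure used by Gene3D is not given here, and genomes of very different sizes would call for normalization. Genomes and superfamily labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def composition_counts(genome: Dict[str, List[str]]) -> Tuple[Counter, Counter]:
    """Gene counts per superfamily (nodes) and per co-occurring superfamily pair (edges)."""
    nodes: Counter = Counter()
    edges: Counter = Counter()
    for codes in genome.values():
        unique = sorted(set(codes))               # count each superfamily once per gene
        nodes.update(unique)
        edges.update(combinations(unique, 2))     # pairs co-occurring in the same gene
    return nodes, edges


def most_different(counts_a: Counter, counts_b: Counter, top: int = 5):
    """Items ranked by |count_b - count_a|; positive values are higher in genome b."""
    keys = set(counts_a) | set(counts_b)
    diffs = {k: counts_b[k] - counts_a[k] for k in keys}
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]


# Hypothetical genomes: gene ID -> superfamilies assigned to that gene.
genome_a = {"g1": ["X", "Y"], "g2": ["X"]}
genome_b = {"h1": ["X", "Y"], "h2": ["X", "Y"], "h3": ["Z"]}
nodes_a, edges_a = composition_counts(genome_a)
nodes_b, edges_b = composition_counts(genome_b)
print(most_different(nodes_a, nodes_b))
print(most_different(edges_a, edges_b))
```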

As an example application, we can compare the non-pathogenic Escherichia coli K12 and the pathogenic strain O157:H7 str. TW14359 (Figure 3). An over-represented domain composition, 4.10.470.10-3.40.420.10, is highlighted. The context-sensitive help for this page tells us that the red line joining these families indicates that these domains co-occur in the same gene more frequently in the pathogen than in the non-pathogen. Clicking on the red line joining the two superfamilies generates a pop-up that allows the user to click through and inspect the genes in the pathogen containing these domains (Figure 3A).

Genome distribution 

Superfamilies range from the very ancient, such as the P-loop hydrolases, which were present in the last universal common ancestor (LUCA) and are found in every organism, to those that appear to have emerged more recently and are taxonomically restricted. Analysis of the genome distribution of domain families can be a powerful tool for evolutionary studies (35), while superfamily phylogenetic profiling has proven effective in identifying functionally linked proteins (36). We now provide a graphical view of a superfamily's genome distribution. Searching for the genome distribution of a superfamily from the front page generates a tree showing the distribution of the superfamily among Ensembl genomes (Figure 4). Different measures of how common a superfamily is can be selected from the genome distribution search on the front page of the website. Alternatively, as with many of the searches, RESTful URLs can be used: http://gene3d.biochem.ucl.ac.uk/superfamily/2.60.40.790/genome-distribution/size/number displays the number of genes with a given superfamily across genomes, while http://gene3d.biochem.ucl.ac.uk/superfamily/2.60.40.790/genome-distribution/size/rate displays the proportion of genes with this superfamily. Cytoscape Web functionality facilitates zooming into the region of interest (Figure 4B). Clicking on a node (a species or species group; Figure 4A) gives a more detailed breakdown of the domain counts and, for species, allows the retrieval of genes.

Figure 3. Domain composition comparison between closely related pathogenic and non-pathogenic strains of E. coli. Links back to the source genes are available via a tooltip (A). The function and superfamily tabs on the right provide the ability to filter by function to help drill down to the subset of interest (B).
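The difference between the "number" and "rate" measures of genome distribution can be made concrete with a small sketch. It assumes that "rate" is the number of genes containing the superfamily divided by the total number of genes in that genome. The text above says only "proportion of genes", so the denominator is an assumption. Species and gene assignments are hypothetical.

```python
from typing import Dict, List


def genome_distribution(genomes: Dict[str, Dict[str, List[str]]],
                        superfamily: str) -> Dict[str, Dict[str, float]]:
    """Per genome: 'number' of genes with the superfamily and 'rate' = number / all genes."""
    result: Dict[str, Dict[str, float]] = {}
    for species, genes in genomes.items():
        number = sum(1 for codes in genes.values() if superfamily in codes)
        rate = number / len(genes) if genes else 0.0
        result[species] = {"number": number, "rate": rate}
    return result


# Hypothetical genomes: species -> gene ID -> CATH codes assigned to that gene.
genomes = {
    "species_1": {"g1": ["2.60.40.790"], "g2": [], "g3": ["2.60.40.790", "1.10.510.10"], "g4": []},
    "species_2": {"h1": []},
}
print(genome_distribution(genomes, "2.60.40.790"))
```

The raw number favours large genomes, whereas the rate corrects for genome size; that is why both views are useful.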

Data downloads 

Since the last Gene3D update, a suite of web services has been developed for structural/functional annotation. The services allow flexible downloading of assignments and associated annotations from Gene3D, and access to some of the computational tools used internally by Gene3D. The services are RESTful, meaning they can easily be accessed from the UNIX command line, from code or even from a browser. For more information, see the recent NAR web services publication (7).
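RESTful access from code can be sketched with the Python standard library. The URL patterns are the superfamily-page and genome-distribution URLs quoted in this article. The helper names are hypothetical, the response is assumed to be text (HTML or similar), and the host and paths date from this release and may no longer be served. The endpoints documented in the web services publication (7) are not reproduced here.

```python
from urllib.request import urlopen

BASE = "http://gene3d.biochem.ucl.ac.uk"  # host as given in this article


def superfamily_url(cath_code: str) -> str:
    """Superfamily page: the CATH code appended to the /superfamily/ URL."""
    parts = cath_code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH superfamily code: {cath_code!r}")
    return f"{BASE}/superfamily/{cath_code}"


def genome_distribution_url(cath_code: str, measure: str = "number") -> str:
    """Genome-distribution view; 'number' = gene counts, 'rate' = proportion of genes."""
    if measure not in ("number", "rate"):
        raise ValueError("measure must be 'number' or 'rate'")
    return f"{superfamily_url(cath_code)}/genome-distribution/size/{measure}"


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download a page as text (requires network access; the server may be offline)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(genome_distribution_url("2.60.40.790", "rate"))
```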



DISCUSSION 

Gene3D is an evolving resource adapting to the rapidly emerging fields of molecular biology. In this update, we have developed tools such as the network analysis tab, which allows protein interaction networks to be analysed in the context of their CATH domains. As new data sets become available, we will continue to integrate them into the database. Among other additions, the novel FunFam assignments provide functional subdivisions of the Gene3D superfamilies, and future releases will exploit this resource further. As the FunFam developers expand the tools available, we will continue to improve this aspect of Gene3D. The uniqueness of the Gene3D resource comes from its genome-wide CATH structural domain assignments. This update builds on that uniqueness by providing more powerful analysis tools and integrating additional, complementary data sources.
Figure 4. Genome distribution of CATH superfamily SH3-Domains across Ensembl genomes. Wider and redder nodes correspond to higher relative frequencies of the superfamily. Node tooltips (A) provide detailed information and links for gene retrieval. Zooming/panning functionality is also available (B).



FUNDING 

IMI, EU (to J.L.); Wellcome Trust (to J.P., I.S.); EU Impact (to C.Y.); BBSRC (to R.R.); National Institutes of Health (to B.H.D.). Funding for open access charge: The Wellcome Trust.

Conflict of interest statement. None declared. 



REFERENCES 

1. Lees,J., Yeats,C., Redfern,O., Clegg,A. and Orengo,C. (2010)
Gene3D: merging structure and function for a Thousand 
genomes. Nucleic Acids Res., 38, D296-D300. 

2. Kersey,P.J., Lawson,D., Birney,E., Derwent,P.S., Haimel,M., 
Herrero,J., Keenan,S., Kerhornou,A., Koscielny,G., Kahari,A.
et al. (2010) Ensembl Genomes: extending Ensembl across the
taxonomic space. Nucleic Acids Res., 38, D563-D569. 

3. Boutet,E., Lieberherr,D., Tognolli,M., Schneider,M. and
Bairoch,A. (2007) UniProtKB/Swiss-Prot. Methods Mol. Biol., 
406, 89-112. 

4. Pruitt,K.D., Tatusova,T., Klimke,W. and Maglott,D.R. (2009) 
NCBI Reference Sequences: current status, policy and new 
initiatives. Nucleic Acids Res., 37, D32-D36. 

5. Cuff,A.L., Sillitoe,I., Lewis,T., Clegg,A.B., Rentzsch,R.,
Furnham,N., Pellegrini-Calace,M., Jones,D., Thornton,J. and
Orengo,C.A. (2011) Extending CATH: increasing coverage of the
protein structure universe and linking structure with function.
Nucleic Acids Res., 39, D420-D426.

6. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., 
Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein
Data Bank. Nucleic Acids Res., 28, 235-242. 

7. Yeats,C., Lees,J., Carter,P., Sillitoe,I. and Orengo,C. (2011) The
Gene3D Web Services: a platform for identifying, annotating 
and comparing structural domains in protein sequences. 
Nucleic Acids Res., 39, W546-W550. 

8. Hunter,S., Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A.,
Binns,D., Bork,P., Das,U., Daugherty,L., Duquenne,L. et al.
(2009) InterPro: the integrative protein signature database.
Nucleic Acids Res., 37, D211-D215.

9. Redfern,O.C., Dessailly,B.H., Dallman,T.J., Sillitoe,I. and
Orengo,C.A. (2009) FLORA: a novel method to predict protein
function from structure in diverse superfamilies. PLoS Comput.
Biol., 5, e1000485.

10. Izarzugaza,J.M., Baresic,A., McMillan,L.E., Yeats,C., Clegg,A.B.,
Orengo,C.A., Martin,A.C. and Valencia,A. (2009) An integrated 
approach to the interpretation of single amino acid 
polymorphisms within the framework of CATH and Gene3D. 
BMC Bioinformatics, 10(Suppl. 8), S5. 

11. Luo,Q., Pagel,P., Vilne,B. and Frishman,D. (2011) DIMA 3.0:
Domain Interaction Map. Nucleic Acids Res., 39, D724-D729. 

12. Bjorkholm,P. and Sonnhammer,E.L. (2009) Comparative analysis
and unification of domain-domain interaction networks.
Bioinformatics, 25, 3020-3025.

13. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H.,
Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T.
et al. (2000) Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium. Nat. Genet., 25, 25-29. 

14. Sayers,E.W., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K.,
Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S. 
et al. (2009) Database resources of the National Center for 
Biotechnology Information. Nucleic Acids Res., 37, D5-D15. 

15. Knox,C., Law,V., Jewison,T., Liu,P., Ly,S., Frolkis,A., Pon,A.,
Banco,K., Mak,C., Neveu,V. et al. (2011) DrugBank 3.0: a 
comprehensive resource for 'omics' research on drugs. 
Nucleic Acids Res., 39, D1035-D1041. 

16. Lee,D.A., Rentzsch,R. and Orengo,C. (2010) GeMMA: functional
subfamily classification within superfamilies of predicted protein 
structural domains. Nucleic Acids Res., 38, 720-737. 

17. Cuff,A., Redfern,O.C., Greene,L., Sillitoe,I., Lewis,T., Dibley,M.,
Reid,A., Pearl,F., Dallman,T., Todd,A. et al. (2009) The CATH 
hierarchy revisited-structural divergence in domain superfamilies 
and the continuity of fold space. Structure, 17, 1051-1062. 

18. Eddy,S.R. (2009) A new generation of homology search tools 
based on probabilistic inference. Genome Inform., 23, 205-211.

19. Redfern,O.C., Harrison,A., Dallman,T., Pearl,F.M. and
Orengo,C.A. (2007) CATHEDRAL: a fast and effective algorithm
to predict folds and domain boundaries from multidomain
protein structures. PLoS Comput. Biol., 3, e232.

Nucleic Acids Research, 2012, Vol. 40, Database issue D471

20. Breitkreutz,B.J., Stark,C., Reguly,T., Boucher,L., Breitkreutz,A.,
Livstone,M., Oughtred,R., Lackner,D.H., Bahler,J., Wood,V.
et al. (2008) The BioGRID Interaction Database: 2008 update.
Nucleic Acids Res., 36, D637-D640.

21. Matthews,L., Gopinath,G., Gillespie,M., Caudy,M., Croft,D.,
de Bono,B., Garapati,P., Hemish,J., Hermjakob,H., Jassal,B.
et al. (2009) Reactome knowledgebase of human biological
pathways and processes. Nucleic Acids Res., 37, D619-D622.

22. Salwinski,L., Miller,C.S., Smith,A.J., Pettit,F.K., Bowie,J.U. and 
Eisenberg,D. (2004) The Database of Interacting Proteins: 2004 
update. Nucleic Acids Res., 32, D449-D451.

23. Aranda,B., Achuthan,P., Alam-Faruque,Y., Armean,I., Bridge,A.,
Derow,C., Feuermann,M., Ghanbarian,A.T., Kerrien,S.,
Khadake,J. et al. (2010) The IntAct molecular interaction
database in 2010. Nucleic Acids Res., 38, D525-D531.

24. Chatr-aryamontri,A., Ceol,A., Palazzi,L.M., Nardelli,G.,
Schneider,M.V., Castagnoli,L. and Cesareni,G. (2007) MINT:
the Molecular INTeraction database. Nucleic Acids Res., 35, 
D572-D574. 

25. Keshava Prasad,T.S., Goel,R., Kandasamy,K., Keerthikumar,S., 
Kumar,S., Mathivanan,S., Telikicherla,D., Raju,R., Shafreen,B., 
Venugopal,A. et al. (2009) Human Protein Reference Database- 
2009 update. Nucleic Acids Res., 37, D767-D772. 

26. Gilsdorf,M., Horn,T., Arziman,Z., Pelz,O., Kiner,E. and
Boutros,M. (2010) GenomeRNAi: a database for cell-based RNAi
phenotypes. 2009 update. Nucleic Acids Res., 38, D448-D452. 

27. Finn,R.D., Mistry,J., Tate,J., Coggill,P., Heger,A., Pollington,J.E.,
Gavin,O.L., Gunasekaran,P., Ceric,G., Forslund,K. et al. (2010) 
The Pfam protein families database. Nucleic Acids Res., 38, 
D211-D222. 

28. Gould,C.M., Diella,F., Via,A., Puntervoll,P., Gemund,C.,
Chabanis-Davidson,S., Michael,S., Sayadi,A., Bryne,J.C., Chica,C.
et al. (2010) ELM: the status of the 2010 eukaryotic linear motif
resource. Nucleic Acids Res., 38, D167-D180.

29. Mora-Bermudez,F., Gerlich,D. and Ellenberg,J. (2007) Maximal 
chromosome compaction occurs by axial shortening in anaphase 
and depends on Aurora kinase. Nat. Cell Biol., 9, 822-831.

30. Szklarczyk,D., Franceschini,A., Kuhn,M., Simonovic,M., Roth,A., 
Minguez,P., Doerks,T., Stark,M., Muller,J., Bork,P. et al. (2011) 
The STRING database in 2011: functional interaction networks 
of proteins, globally integrated and scored. Nucleic Acids Res., 39, 
D561-D568. 

31. McDowall,M.D., Scott,M.S. and Barton,G.J. (2009) PIPs: human 
protein-protein interaction prediction database. Nucleic Acids 
Res., 37, D651-D656. 

32. Lopes,C.T., Franz,M., Kazi,F., Donaldson,S.L., Morris,Q. and 
Bader,G.D. (2010) Cytoscape Web: an interactive web-based 
network browser. Bioinformatics, 26, 2347-2348. 

33. Wen,K.W. and Damania,B. (2010) Kaposi sarcoma-associated 
herpes virus (KSHV): molecular biology and oncogenesis. 
Cancer Lett., 289, 140-150. 

34. Jenner,R.G., Maillard,K., Cattini,N., Weiss,R.A., Boshoff,C., 
Wooster,R. and Kellam,P. (2003) Kaposi's sarcoma-associated 
herpesvirus-infected primary effusion lymphoma has a plasma cell 
gene expression profile. Proc. Natl Acad. Sci. USA, 100, 
10399-10404. 

35. Wilson,D., Pethica,R., Zhou,Y., Talbot,C., Vogel,C., Madera,M., 
Chothia,C. and Gough,J. (2009) SUPERFAMILY--sophisticated
comparative genomics, data mining, visualization and phylogeny. 
Nucleic Acids Res., 37, D380-D386. 

36. Ranea,J.A., Yeats,C., Grant,A. and Orengo,C.A. (2007) 
Predicting protein function with hierarchical phylogenetic profiles: 
the Gene3D Phylo-Tuner method applied to eukaryotic genomes. 
PLoS Comput. Biol., 3, e237.



